Audiovisual perceptual evaluation of resynthesised speech movements
Authors
Abstract
We have previously presented a system that tracks the 3D speech movements of a speaker's face in a monocular video sequence. For that purpose, speaker-specific models of the face were built, including a 3D shape model and several appearance models. In this paper, speech movements estimated with this system are evaluated perceptually. The movements are re-synthesised using a Point-Light (PL) rendering and paired with original audio signals degraded with white noise at several SNRs. We study how much such PL movements enhance the identification of logatoms, and to what extent they influence the perception of incongruent audio-visual logatoms. In a first experiment, the PL rendering is evaluated per se. The results appear to confirm previous studies: though less effective than actual video, PL speech enhances intelligibility and can reproduce the McGurk effect. In the second experiment, the movements were estimated with our tracking framework using various appearance models. No salient differences emerge between the performances of the appearance models.
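The abstract does not spell out how the stimuli were prepared, but both steps it names are standard operations. The Python sketch below is illustrative only: it mixes white Gaussian noise into a clean signal at a prescribed SNR and renders one point-light frame from tracked 2D point positions. The function names, the stand-in signal, and the SNR levels are assumptions, not details taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def degrade_with_white_noise(speech, snr_db, rng=None):
    """Add white Gaussian noise so the mixture reaches the requested SNR (dB)."""
    rng = np.random.default_rng(0) if rng is None else rng
    p_signal = np.mean(speech ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return speech + rng.normal(0.0, np.sqrt(p_noise), speech.shape)

def render_point_light_frame(points_2d, out_path):
    """Render one PL frame: tracked facial points as white dots on black."""
    fig, ax = plt.subplots(figsize=(4, 4), facecolor="black")
    ax.set_facecolor("black")
    ax.scatter(points_2d[:, 0], points_2d[:, 1], s=20, c="white")
    ax.invert_yaxis()   # image coordinates: y grows downward
    ax.axis("off")
    fig.savefig(out_path, dpi=100, facecolor="black")
    plt.close(fig)

# Hypothetical usage: the exact SNR levels are not given in the abstract.
clean_logatom = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # stand-in signal
stimuli = {snr: degrade_with_white_noise(clean_logatom, snr) for snr in (12, 6, 0, -6)}
```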
Similar resources
Vision of tongue movements bias auditory speech perception.
Audiovisual speech perception is likely based on the association of auditory and visual information into stable audiovisual maps. Conflicting audiovisual inputs generate perceptual illusions such as the McGurk effect. Audiovisual mismatch effects could be driven either by the detection of violations of standard audiovisual statistics or by the sensorimotor reconstruction of the distal...
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation
Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or i...
Interaction of visual cues for prominence
The timing of both eyebrow and head movements of a talking face was varied systematically in a test sentence using an audiovisual speech synthesizer. The audio speech signal was unchanged over all sentences. 33 listeners were given the task of identifying the most prominent word in the test sentence. Results indicate that both eyebrow and head movements are powerful visual cues for prominence a...
Towards a lexical fuzzy logical model of perception: the time-course of audiovisual speech processing in word identification
This study investigates the time-course of information processing in both the visual and the auditory speech signals used for word identification in face-to-face communication. It extends the limited previous research on this topic and provides a valuable database for future research in audiovisual speech perception. An evaluation of models of speech perception by ear and eye in their ability ...
Teaching and learning guide for audiovisual speech perception: A new approach and implications for clinical populations
When a speaker talks, the visible consequences of what they are saying can be seen. This auditory (the speech sound) and visual (movements of the lips and other articulators) information, or AV speech, influences what listeners hear both in noisy listening environments and when auditory speech can easily be heard. Thought to be a cross-cultural phenomenon that emerges early in typical language development, ...
Publication date: 2004